Introduction

This paper describes a computer program which accepts and "understands"
a comfortable, but restricted subset of one natural language, English. Certain
difficulties are inherent in this problem of making a machine "understand"
English*. Within the limited framework of the subject matter understood by
the program, many of these problems are solved or circumvented. I shall
describe these problems and my solutions, and point out those solutions which
I feel have general applicability. I will also indicate which must be
replaced by more general methods to be really useful, and give my ideas about
what general solutions to these particular problems might entail.

I shall not bore the reader at this point with a diatribe on why one
would want to communicate to the computer in English. Suffice it to say
that 200 million English speaking people can't be all wrong, and if they
could speak to a computer they might even be right more often. Man's
ability to use symbols and language is a prime factor in his intelligence,
and when we learn how to make a computer understand any natural language,
we will have taken a large step toward creating an "artificially intelligent"
machine. This is not to say that using "natural language" is necessary; one
might do even better to make people change to some more "intelligent" language.

*To avoid excessive circumlocutions, I shall henceforth use just "English"
instead of the hedging phrase "restricted subset of English", and use
"understand" only in the sense defined below.

The question naturally arises, "What do you mean by having a computer
understand natural language?" I have adopted the following operational
definition of understanding. A computer understands a subset of English
if it will accept input sentences which are members of this subset, and
correctly answer questions based on information contained in those sentences.
This ability must extend to deductions based on implicit information
contained in several sentences. It is desirable that the answers also be
in English to facilitate communication between the computer and a person.

We thus define "understanding" in terms of statements in English. The
computer must accept them as input, and answer certain queries about them. How
should the computer store the information contained in these statements?
If each sentence could be stored unchanged, no information would be lost,
but this would put a tremendous burden on the question-answering portion
of the program. The question answerer would have to find all relevant
sentences, extract the "meaning" pertinent to the question asked, and perform
those deductions and manipulations necessary to find the answer to the
question asked. For a large corpus, sorting out the relevant material
would be a very costly task.

One way of easing the burden of sorting is to create an index for the
input corpus. However, unless "meaning" is first extracted from the sentences,
the index must be based upon the words in a sentence. The value of the index
is then somewhat denigrated by the problems of synonymy and homography.

In a general question-answering system, each type of question may
require that meaning be extracted in a different way for convenient
manipulation and deduction. Deductive techniques may differ depending on the
type of question and the information available in the corpus. To simplify
such problems of sorting for relevant material, extracting meaning, and
making inferences, one is driven to select a "point of view", and delimit
the type of questions answerable by a specific program. An example of a
program written with such a point of view is the

SAD SAM program written by Robert Lindsay at Carnegie Tech in 1960. It
accepted as input most sentences which could be written in Basic English
(a subset of English, designed by C. K. Ogden, which contains a vocabulary
of about 1500 words). The questions which it answers are concerned with
family relationships between individuals, e.g., "Is Tom the brother of
Mary?" or "Who are Jack's grandchildren?" SAD SAM extracts the meaning
of a sentence, relevant to family relationship, and stores only that
information. Thus from the sentence, "Mary, Tom's sister, went to the meeting,"
the information about where Mary went would be discarded. The program stores
in a family tree type of representation, the information about Mary and
Tom's relationship, i.e., it makes them both children of the same (as yet
unspecified) pair of parents. The family tree grows as more information
is added to the system. To answer a question concerning the relationship

course. An example is:

"The sum of two numbers is 96, and one of the numbers is 16
larger than the other number. Find the two numbers."

Exactly this statement of this problem has been accepted by the STUDENT
program and the following solution printed out:

"One of the numbers is 56"

"The other number is 40"

The details of how this is accomplished will be discussed below.

I chose this problem context for a number of reasons. First, there is
a good form in which to store this type of information for later
manipulation, namely as algebraic equations. Secondly, I felt that there was a
manageable subset of English in which many of these problems would be
expressible, and that this subset could be expanded incrementally. Finally,
there are a large number of "algebra story problems" available in
first year high school text books.

Since the entire process from input processing to question answering was
programmed, a measure of comparison with human performance is available.
The program answers most questions that it can handle as fast

the type of information expected. The information storage structure used in
STUDENT falls into the category of what I call relational models.

A relational model is defined by three things: a set of objects, the
relationships between these objects expressible in the model, and a language
media for exhibiting the relationships that exist between particular objects.
A relational model is useful for a question answering system if there exist
techniques which can take advantage of the relational language to find
implied, but not necessarily explicitly stated, relationships between objects
of the model.

Lindsay's program for answering questions about family relationships
uses such a relational model. The objects in the model are people, and
the basic relationship used in the model is the parent-child relationship.
The media used for expressing the relationship between individuals is a
tree of nodes, with nodes representing individuals, and directed branches
representing the parent-child relationship. This model is useful because
all other family relationships can be defined in terms of this one basic
relationship, and questions about the relationship between two individuals
can be answered on the basis of a computation on the path connecting these
two individuals in the family tree.
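
The idea of deriving other family relationships from the single parent-child
relation can be sketched in a few lines of Python. This is only an
illustration under my own assumptions, not Lindsay's program: the names
add_child, are_siblings and grandchildren are hypothetical, and "Ann" stands
in for the as yet unspecified parent of Mary and Tom.

```python
from collections import defaultdict

# The only relation stored in the model: directed parent -> child branches.
children = defaultdict(set)
# Inverse index (child -> parents), kept for convenient upward lookups.
parents = defaultdict(set)

def add_child(parent, child):
    """Record one directed parent-child branch of the family tree."""
    children[parent].add(child)
    parents[child].add(parent)

def are_siblings(a, b):
    """Two distinct individuals are siblings if they share a parent."""
    return a != b and bool(parents[a] & parents[b])

def grandchildren(person):
    """Follow two parent-child branches downward from person."""
    return {g for c in children[person] for g in children[c]}

# "Mary, Tom's sister, ..." makes both children of the same parent.
add_child("Ann", "Mary")   # "Ann" is a placeholder for the unnamed parent
add_child("Ann", "Tom")
add_child("Jack", "Ann")   # hypothetical extra fact, for the grandchild query

print(are_siblings("Tom", "Mary"))
print(sorted(grandchildren("Jack")))
```

Both queries are answered by a computation on paths in the tree; no
"sibling" or "grandchild" fact is ever stored directly.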

The STUDENT question answering system also uses a relational model.
The objects in the model are words and phrases "naming" numbers, or numbers
with units attached. I call these objects "variables". The basic
relationships are the arithmetic relations of sum, difference, product, quotient,
exponentiation and equality. The media for expressing the relationships
between objects is a set of equations. The model is useful because well
defined techniques exist for finding numerical values which satisfy sets
of simultaneous equations. Thus the system can answer questions about the
value of a number named by a given phrase, although this value is given only
implicitly in the relationships stated in the problem.

For any relational model used in an "English language" question answering
system, there are two important considerations. The first is how can
the English input be transformed into the relational language media, and
the second is what are good deductive techniques for using the model to
solve problems. For algebra story problems a good general format for a
relational model was known, based on sets of simultaneous equations. The
implementation within the STUDENT program of the transformation and
solution procedures, based on this general model, is discussed below.

The Notation Used in STUDENT's Relational Model

The relational model in the STUDENT system uses a set of algebraic
equations to represent the arithmetic relationships expressed in the English
input. These equations are expressed in a parenthesized prefix notation
rather than the conventional infix notation. For example, the conventional
infix notation expression, B + C, is written (PLUS B C).

In general, in this prefix notation, the name of the arithmetic function
used is made the first element of a list, and succeeding list elements are
the variables which are the arguments of that function. The exact notation
used is given in Figure 1 below. Note that "minus" is a unary minus, and
that the usual binary subtraction operator is a composite relation in the model.
In addition, "plus" and "times" are not strictly binary. Indeed, in the
model they may have an indefinite number of arguments, e.g., (TIMES A B C)
is a legitimate prefix notation expression in the STUDENT model.

Operation         Infix Notation    Prefix Notation

Equality          A = B             (EQUAL A B)
Addition          A + B             (PLUS A B)
Negation          -A                (MINUS A)
Subtraction       A - B             (PLUS A (MINUS B))
Multiplication    A * B             (TIMES A B)
Division          A / B             (QUOTIENT A B)
Exponentiation    A^B               (EXPT A B)

Figure 1


The use of a fully parenthesized notation such as this circumvents the
problem of ambiguity in the order in which operations occur. In the
expression A + B * C in unparenthesized infix notation, it is unclear whether A is
to be added to B and the sum multiplied by C, or if the product of B and C
is to be added to A. One solution to this ambiguity is to give each
operation a relative precedence, and operations of higher precedence are assumed
to be performed first. Such a precedence scheme is assumed by STUDENT in
determining an interpretation of similar ambiguous English expressions.
Once inside the model, however, arithmetic operations and arguments are fully
parenthesized, and therefore, order of operations is unambiguous.
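
To make the point about unambiguous evaluation concrete, here is a minimal
Python sketch of an evaluator for the prefix notation of Figure 1. It is an
illustration only: nested tuples stand in for LISP lists, and the function
name evaluate and the dictionary env of variable values are my own
assumptions, not part of STUDENT.

```python
import math

def evaluate(expr, env):
    """Evaluate a prefix expression written as nested tuples."""
    if isinstance(expr, (int, float)):
        return expr                      # a literal number
    if isinstance(expr, str):
        return env[expr]                 # a variable name
    op, *args = expr
    vals = [evaluate(a, env) for a in args]
    if op == "PLUS":                     # PLUS takes any number of arguments
        return sum(vals)
    if op == "TIMES":                    # so does TIMES
        return math.prod(vals)
    if op == "MINUS":                    # unary minus only
        (v,) = vals
        return -v
    if op == "QUOTIENT":
        a, b = vals
        return a / b
    if op == "EXPT":
        a, b = vals
        return a ** b
    raise ValueError(f"unknown operator {op!r}")

env = {"A": 2, "B": 3, "C": 4}
print(evaluate(("PLUS", "A", ("TIMES", "B", "C")), env))   # A + (B * C)
print(evaluate(("TIMES", ("PLUS", "A", "B"), "C"), env))   # (A + B) * C
print(evaluate(("PLUS", "A", ("MINUS", "B")), env))        # A - B
```

The two readings of A + B * C are simply two different lists, so no
precedence rule is needed once an expression is inside the model.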

Outline of the Operation of STUDENT 

The first step in the operation of the STUDENT question answering system
is to provide the STUDENT "program" with some general information which it
will "remember" and use, if relevant, in all problems it is asked to do.
Through a program called REMEMBER, this information becomes part of the
permanent store of knowledge in the system and is what I call global information.
Examples of global facts which have been given to the STUDENT program are

"12 inches equals 1 foot"

or

"twice always means 2 times"

How these facts are used will be described later.

Next, STUDENT is asked to solve a particular problem. To do this,
it transforms the English statement of the problem into the media of the
relational model, and then manipulates objects in the model to find the
answer. More specifically, STUDENT transforms the English input into a
set of simultaneous equations, keeping a list of what answers are required,
a list of the units involved (e.g., dollars, pounds) and a list of all the
variables in the equations. Then STUDENT invokes the SOLVE routine to solve
this set of equations for the desired unknowns. If a solution is found,
STUDENT prints the values of the unknowns requested in the format
illustrated earlier, i.e., substituting in "(variable is value)" the appropriate
phrases for variable and value. If a solution cannot be found, various
heuristics are used to identify two variables (i.e., find two slightly
different phrases that refer to the same number). If two variables, A and B,
are identified, the equation A = B is added to the set of equations.
Reference is also made to the store of global information to find any relevant
equations. Assumptions made about the identity of variables, and the possibly
relevant stored equations retrieved by STUDENT, are printed out. If use
of these identities or equations leads to a solution, the
result is printed out in the format described above.

If a solution was not found, and certain idioms are present in the
English statement of the problem, then a substitution is made for each of
these idioms in turn, and the transformation and solution processes are
repeated. If it is unsuccessful with all single substitutions, STUDENT
reports the failure and terminates. If the problem is ever solved, the
solution is printed and the program terminates.
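
The retry strategy for optional idioms can be sketched as a small Python
driver. This is a simplified illustration under my own assumptions: the
function answer_problem and its parameters are hypothetical, the transform
and solve steps are passed in as stand-ins, and the intermediate step of
identifying variables and retrieving global equations is omitted.

```python
def answer_problem(text, transform, solve, optional_idioms):
    """Try the problem as stated, then once for each applicable idiom."""
    result = solve(transform(text))
    if result is not None:
        return result
    for idiom, replacement in optional_idioms:
        if idiom in text:
            # One single substitution at a time, then transform and solve again.
            result = solve(transform(text.replace(idiom, replacement)))
            if result is not None:
                return result
    return None   # all single substitutions failed

# Toy stand-ins (assumptions): "solving" succeeds only if SUM is mentioned.
idioms = [("THE PERIMETER OF THE RECTANGLE",
           "TWICE THE SUM OF THE LENGTH AND WIDTH OF THE RECTANGLE")]
toy_transform = lambda text: text.split()
toy_solve = lambda words: "solved" if "SUM" in words else None

print(answer_problem("THE PERIMETER OF THE RECTANGLE IS 10",
                     toy_transform, toy_solve, idioms))
```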


Transformation of the English

The words and phrases (strings of words) in the English input can be
classified into three distinct categories on the basis of how they are
handled in the transformation. The first category consists of strings of words
which denote objects in the model; I call such strings variables.
Variables are identified only by the string of words in them, and if two strings
differ at all, they define distinct variables. One important problem
considered below is how to determine when two distinct variables refer to the
same object.

The second class of words and phrases are what I call "substitutors".
Each substitutor may be replaced by another string. Some substitutions are
mandatory; others are optional and are only made if the problem cannot be
solved without such substitutions. An example of a mandatory substitution
is "2 times" for the word "twice". "Twice" always means "2 times" in the
context of the model, and therefore this substitution is always made. One
optional "idiomatic" substitution is "twice the sum of the length and width
of the rectangle" for "the perimeter of the rectangle". The use of these
substitutions in the transformation process is discussed below.

Members of the third class of words and phrases indicate the
relationships between the objects in the model, i.e., the variables in the problem.
I call members of this third class "operators". Operators may indicate
operations which are complex combinations of the basic relationships. One
simple operator is the word "plus", which indicates the operation of addition.
A complex operator is the phrase "percent less than", as in "10 percent
less than the marked price", which locates the number immediately preceding
"percent", subtracts it from 100, divides this result by 100, and then
multiplies this quotient by the variable following the "than".


The class of operators may be further subdivided according to where
the arguments of the operators are found. A prefix operator, such as
"the square of ...", precedes its argument. An operator like "... percent"
is a suffix operator, and follows its argument. Infix operators such as
"... plus ..." or "... less than ..." appear between their two arguments.
In a split prefix operator such as "difference between ... and ...", part of
the operator precedes, and part appears between the two arguments. "The sum
of ... and ... and ..." is a split prefix operator with an indefinite
number of arguments.

Some words may conditionally act as operators, depending on their
context. For example, "of" is only equivalent to "times" if there is a number
immediately preceding it; e.g., ".5 of the profit" is equivalent to ".5 times
the profit"; however, "Queen of England" does not imply a multiplicative
relationship between the Queen and her country.

Let us now consider in detail the transformation procedure used by
STUDENT and see how these different types of phrases interact. To make the
process more concrete, let us consider the following example which has been
solved by STUDENT.

(THE PROBLEM TO BE SOLVED IS)

(IF THE NUMBER OF CUSTOMERS TOM GETS IS TWICE THE SQUARE OF 20 PER
CENT OF THE NUMBER OF ADVERTISEMENTS HE RUNS, AND THE NUMBER OF
ADVERTISEMENTS HE RUNS IS 45, WHAT IS THE NUMBER OF CUSTOMERS TOM
GETS Q.)

This text is a copy of actual printout from the program, showing stages
in the transformation and the solution of the problem. The parentheses are
an artifact of the LISP programming language, and "Q." is a replacement for
the question mark not available on the key punch.

The first stage in the transformation is to perform all mandatory
substitutions. In this problem, only the three phrases underlined (single
words are one word phrases) are substitutors: "twice" becomes "2 times",
"per cent" becomes the single word "percent", and "square of" is truncated
to "square". Having made these substitutions, STUDENT prints:

(WITH MANDATORY SUBSTITUTIONS THE PROBLEM IS)

(IF THE NUMBER OF CUSTOMERS TOM GETS IS 2 TIMES THE SQUARE 20 PERCENT
OF THE NUMBER OF ADVERTISEMENTS HE RUNS, AND THE NUMBER OF
ADVERTISEMENTS HE RUNS IS 45, WHAT IS THE NUMBER OF CUSTOMERS TOM GETS Q.)

Figure 3
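
The mandatory substitution stage amounts to a fixed table of string
replacements. The following Python sketch illustrates it for the three
substitutors named above; the table MANDATORY and the function
apply_mandatory are hypothetical names of mine, and the use of regular
expressions is an assumption, not the mechanism used inside STUDENT.

```python
import re

# Substitutor -> replacement, for the three phrases used in the example.
MANDATORY = [
    (r"\bTWICE\b", "2 TIMES"),
    (r"\bPER CENT\b", "PERCENT"),
    (r"\bSQUARE OF\b", "SQUARE"),
]

def apply_mandatory(text):
    """Apply every mandatory substitution wherever its phrase occurs."""
    for pattern, replacement in MANDATORY:
        text = re.sub(pattern, replacement, text)
    return text

problem = ("IF THE NUMBER OF CUSTOMERS TOM GETS IS TWICE THE SQUARE OF "
           "20 PER CENT OF THE NUMBER OF ADVERTISEMENTS HE RUNS , AND THE "
           "NUMBER OF ADVERTISEMENTS HE RUNS IS 45 , WHAT IS THE NUMBER OF "
           "CUSTOMERS TOM GETS Q.")
print(apply_mandatory(problem))
```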


Using dictionary entries for each word, the words in the problem are
now tagged by their function in terms of the transformation process, and
STUDENT prints:

(WITH WORDS TAGGED BY FUNCTION THE PROBLEM IS)

(IF THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) IS 2 (TIMES / OP 1)
THE (SQUARE / OP 1) 20 (PERCENT / OP 2) (OF / OP) THE NUMBER (OF / OP)
ADVERTISEMENTS (HE / PRO) RUNS , AND THE NUMBER (OF / OP) ADVERTISEMENTS
(HE / PRO) RUNS IS 45 , (WHAT / QWORD) IS THE NUMBER (OF / OP) CUSTOMERS
TOM (GETS / VERB) (QMARK / DLM))

Figure 4


If a word has a tag, or tags, the word followed by "/", followed by the tags,
becomes a single unit, and is enclosed in parentheses. Typical taggings
are indicated in Figure 4. "(OF / OP)" indicates that "of" is an operator,
and other taggings show that "gets" is a verb, "times" is an operator of
level 1 (operator levels will be explained below), "square" is an
operator of level 1, "percent" is an operator of level 2, "he" is a pronoun,
"what" is a question word, and "QMARK" (replacing Q.) is a delimiter of a
sentence. These tagged words will play the principal role in the
remaining transformation to the set of equations implicit in this problem
statement.
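
A dictionary-driven tagger of this kind can be sketched in Python as
follows. The dictionary contents are taken from the taggings visible in
Figure 4; the names DICTIONARY, RENAME, tag_words and show are my own
assumptions, and the tuple representation merely imitates the LISP lists.

```python
# Word -> tags, as seen in Figure 4 (an assumed, partial dictionary).
DICTIONARY = {
    "OF": ("OP",),
    "GETS": ("VERB",),
    "TIMES": ("OP", "1"),
    "SQUARE": ("OP", "1"),
    "PERCENT": ("OP", "2"),
    "HE": ("PRO",),
    "WHAT": ("QWORD",),
    "QMARK": ("DLM",),
    "PERIOD": ("DLM",),
}
# Punctuation tokens are renamed before lookup.
RENAME = {"Q.": "QMARK", ".": "PERIOD"}

def tag_words(text):
    """Return a list whose items are plain words or (word, tag, ...) tuples."""
    tagged = []
    for word in text.split():
        word = RENAME.get(word, word)
        tags = DICTIONARY.get(word)
        tagged.append((word, *tags) if tags else word)
    return tagged

def show(tagged):
    """Print tagged words in the (WORD / TAGS) style of the printout."""
    parts = []
    for item in tagged:
        if isinstance(item, tuple):
            word, *tags = item
            parts.append(f"({word} / {' '.join(tags)})")
        else:
            parts.append(item)
    return "(" + " ".join(parts) + ")"

print(show(tag_words("WHAT IS THE NUMBER OF CUSTOMERS TOM GETS Q.")))
```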

The next stage in the transformation is to break the input sentences
into "simple sentences". As in the example, a problem may be stated using
sentences of great grammatical complexity; but the final stage of the
transformation is only defined on a set of simple sentences. The method adopted
here to perform this analysis is ad hoc and primitive, but works reasonably
well because of the limited number of ways in which algebra story problems
are expressed. This problem of extraction of simple understandable sentences
occurs in any general language processor.

The simplification method employed in STUDENT depends on the recursive
use of format matching. If an input sentence is of the form "if" followed
by a substring, followed by a comma, a question word and a second substring
(i.e., matches the COMIT left half IF + $ + , + $1/QWORD + $) then the
first substring (between the IF and the comma) is made an independent
sentence, and everything following the comma is made a second sentence. In
the example, this means that the input is resolved into the two sentences
(where tags are omitted for the sake of brevity):

"The number of customers Tom gets is 2 times the square 20 percent of
the number of advertisements he runs, and the number of advertisements
he runs is 45." and "What is the number of customers Tom gets?"

This last procedure effectively resolves a problem into declarative
assumptions and a question sentence. A second complexity resolved by
STUDENT is illustrated in the first sentence of this pair. A coordinate
sentence consisting of two sentences joined by a comma immediately
followed by an "and" (i.e., any sentence matching the COMIT left half
$ + , + AND + $) will be resolved into the two independent sentences. The
first sentence above is therefore resolved into two simpler sentences.
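
The two format simplifications can be sketched as a recursive Python
function over a list of words. This is an illustration under my own
assumptions: it works on untagged words, QUESTION_WORDS is an assumed stand-in
for the QWORD dictionary tag, and split_simple is a hypothetical name.

```python
QUESTION_WORDS = {"WHAT", "HOW"}   # assumed stand-in for the QWORD tag

def split_simple(words):
    """Recursively resolve a word list into simple sentences."""
    # Format 1: IF + $ + , + QWORD + $
    if words and words[0] == "IF":
        for i, w in enumerate(words):
            if w == "," and i + 1 < len(words) and words[i + 1] in QUESTION_WORDS:
                assumption = words[1:i] + ["."]
                question = words[i + 1:]
                return split_simple(assumption) + split_simple(question)
    # Format 2: $ + , + AND + $
    for i, w in enumerate(words):
        if w == "," and i + 1 < len(words) and words[i + 1] == "AND":
            return split_simple(words[:i] + ["."]) + split_simple(words[i + 2:])
    return [words]   # already a simple sentence

problem = ("IF THE NUMBER OF CUSTOMERS TOM GETS IS 2 TIMES THE SQUARE "
           "20 PERCENT OF THE NUMBER OF ADVERTISEMENTS HE RUNS , AND THE "
           "NUMBER OF ADVERTISEMENTS HE RUNS IS 45 , WHAT IS THE NUMBER OF "
           "CUSTOMERS TOM GETS Q.").split()
for sentence in split_simple(problem):
    print(" ".join(sentence))
```

Applying the first format and then the second to its result yields the three
simple sentences of the example.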

Using these two ad hoc format simplifications, the problem statement
is put into "canonically simple" sentences. For the example, STUDENT prints:

(THE SIMPLE SENTENCES ARE)

(THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB) IS 2 (TIMES / OP 1)
THE (SQUARE / OP 1) 20 (PERCENT / OP 2) (OF / OP) THE NUMBER (OF / OP)
ADVERTISEMENTS (HE / PRO) RUNS (PERIOD / DLM))

(THE NUMBER (OF / OP) ADVERTISEMENTS (HE / PRO) RUNS IS 45
(PERIOD / DLM))

((WHAT / QWORD) IS THE NUMBER (OF / OP) CUSTOMERS TOM (GETS / VERB)
(QMARK / DLM))

Figure 5


Each simple sentence is a separate list, i.e., is enclosed in parentheses,
and each ends with a delimiter (a period or question mark). Each of these
sentences can now be transformed directly to its interpretation in the model

nized by the STUDENT program. New operators can easily be added to the
program equivalent of this table.

In performing the transformation of a phrase P, a left to right search
is made for an operator of level 2 (indicated by subscripts "OP" and 2).
If none is found, a left to right search is made for a level 1 operator
(indicated by subscripts "OP" and 1), and finally another left to right
search for an operator of level 0 (indicated by a subscript "OP" and no
numerical subscript). If an operator is found, this operator and its
context are transformed as indicated in column 4 in the table. If no
operator is present, delimiters and articles (a, an and the) are deleted and
the phrase is treated as an indivisible entity, a variable.

In the example, the first simple sentence is

(THE NUMBER (OF/OP) CUSTOMERS TOM (GETS/VERB) IS 2 (TIMES/OP 1) THE
(SQUARE/OP 1) 20 (PERCENT/OP 2) (OF/OP) THE NUMBER (OF/OP)
ADVERTISEMENTS (HE/PRO) RUNS (PERIOD/DLM))

This is of the form "P1 is P2", and is transformed to (EQUAL P1* P2*).
P1 is "(THE NUMBER (OF/OP) CUSTOMERS TOM (GETS/VERB))". The occurrence
of the verb "gets" is ignored because of the presence of the "is" in the
sentence, meaning "equals". The only operator found is "(OF/OP)". From
the table we see that if "of" is immediately preceded by a number (not the
word "number") it is treated as if it were the infix "times". In this case,
however, "of" is not preceded by a number; the subscript OP indicating that
"of" is an operator is stripped away, and the transformation process is
repeated on the phrase with "of" no longer acting as an operator. In this
repetition, no operators are found, and P1* is the variable

(NUMBER OF CUSTOMERS TOM (GETS/VERB)).

To the right of "is" in the sentence is P2:

(2 (TIMES/OP 1) THE (SQUARE/OP 1) 20 (PERCENT/OP 2) (OF/OP) THE NUMBER
(OF/OP) ADVERTISEMENTS (HE/PRO) RUNS (PERIOD/DLM))


Operators in STUDENT

Operator    Precedence  Context                       Transformation to
            Level                                     Interpretation in the Model (a)

PLUS        2           P1 PLUS P2                    (PLUS P1* P2*)
PLUSS       0           P1 PLUSS P2                   (PLUS P1* P2*)
MINUS       2           P1 MINUS P2                   (PLUS P1* (MINUS P2*))       (b)
                        MINUS P2                      (MINUS P2*)
MINUSS      0           P1 MINUSS P2                  (PLUS P1* (MINUS P2*))
TIMES       1           P1 TIMES P2                   (TIMES P1* P2*)
DIVBY       1           P1 DIVBY P2                   (QUOTIENT P1* P2*)
SQUARE      1           SQUARE P1                     (EXPT P1* 2)                 (c)
SQUARED     0           P1 SQUARED                    (EXPT P1* 2)
**          0           P1 ** P2                      (EXPT P1* P2*)
LESSTHAN    2           P1 LESSTHAN P2                (PLUS P2* (MINUS P1*))
PER         0           P1 PER K P2                   (QUOTIENT P1* (K P2)*)       (d)
                        P1 PER P2                     (QUOTIENT P1* (1 P2)*)       (e)
PERCENT     2           P1 K PERCENT P2               (P1 (K/100) P2)*
PERLESS     2           P1 K PERLESS P2               (P1 ((100-K)/100) P2)*       (f)
SUM         0           SUM P1 AND P2 AND P3          (PLUS P1* (SUM P2 AND P3)*)  (c)
                        SUM P1 AND P2                 (PLUS P1* P2*)
DIFFERENCE  0           DIFFERENCE BETWEEN P1 AND P2  (PLUS P1* (MINUS P2*))
OF          0           K OF P2                       (TIMES K P2*)
                        P1 OF P2                      (P1 OF P2)*

(a) If P1 is a phrase, P1* indicates its interpretation in the model.

(b) When two possible contexts are indicated, they are checked in the
order shown.

(c) SQUARE P1 and SUM P1 are idiomatic shortenings of SQUARE OF P1 and
SUM OF P1.

(d) * outside a parenthesized expression indicates that the entire phrase
enclosed is to be transformed.

(e) K is a number.

(f) / and - imply that the indicated arithmetic operations are actually
performed.

Figure 6

The first operator found in P2 is PERCENT, an operator at level 2.
From the table in Figure 6, we see that this operator has the effect of
dividing the number immediately preceding it by 100. The "PERCENT" is
removed and the transformation is repeated on the remaining phrase. In
the example, the "... 20 (PERCENT/OP 2) (OF/OP) ..." becomes
"... .2000 (OF/OP) ...".

Continuing the transformation, the operators found are, in order,
TIMES, SQUARE, OF and OF. Each is handled as indicated in the table. The
"of" in the context "... .2000 (OF/OP) THE ..." is treated as an infix TIMES,
while at the other occurrence of "OF", the operator marking is removed.
The resulting transformed expression for P2 is:

(TIMES 2 (EXPT (TIMES .2 (NUMBER OF ADVERTISEMENTS (HE/PRO) RUNS)) 2))
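
The level-by-level operator search can be sketched in Python for the few
operators this example needs. This is an illustration under my own
assumptions, not the STUDENT code: operators are marked with a "/OP" suffix
instead of LISP tags, only PLUS, PERCENT, TIMES, SQUARE and OF are handled,
variables are returned as plain strings, and the names LEVEL, find_operator
and transform are hypothetical.

```python
LEVEL = {"PLUS": 2, "PERCENT": 2, "TIMES": 1, "SQUARE": 1, "OF": 0}
ARTICLES = {"A", "AN", "THE"}

def is_number(token):
    try:
        float(token)
        return True
    except ValueError:
        return False

def find_operator(tokens):
    """Leftmost operator of the highest level present (2, then 1, then 0)."""
    for level in (2, 1, 0):
        for i, tok in enumerate(tokens):
            if tok.endswith("/OP") and LEVEL[tok[:-3]] == level:
                return i
    return None

def transform(tokens):
    """Transform a phrase into a number, a variable string, or a prefix tuple."""
    i = find_operator(tokens)
    if i is None:
        # No operator: delete delimiters and articles, keep the rest whole.
        words = [t for t in tokens if t not in ARTICLES and t != "."]
        if len(words) == 1 and is_number(words[0]):
            return float(words[0])
        return " ".join(words)
    op, left, right = tokens[i][:-3], tokens[:i], tokens[i + 1:]
    if op == "PERCENT":      # P1 K PERCENT P2 -> (P1 (K/100) P2)*
        k = float(left[-1]) / 100
        return transform(left[:-1] + [str(k)] + right)
    if op in ("PLUS", "TIMES"):
        return (op, transform(left), transform(right))
    if op == "SQUARE":       # SQUARE P1 -> (EXPT P1* 2)
        return ("EXPT", transform(right), 2)
    if op == "OF":
        if left and is_number(left[-1]):          # K OF P2 -> (TIMES K P2*)
            return ("TIMES", transform(left), transform(right))
        return transform(left + ["OF"] + right)   # strip the operator marking
    raise ValueError(f"unhandled operator {op!r}")

p2 = ("2 TIMES/OP THE SQUARE/OP 20 PERCENT/OP OF/OP THE NUMBER OF/OP "
      "ADVERTISEMENTS HE RUNS .").split()
print(transform(p2))
```

The nesting of the printed tuple follows the same order of discovery as the
text: PERCENT first, then TIMES, SQUARE, and the two occurrences of OF.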

The transformation of the second sentence of the example is done in
a similar manner, and yields the equation:

(EQUAL (NUMBER OF ADVERTISEMENTS (HE/PRO) RUNS) 45)

The third sentence is of the form "What is P1?". It starts with a
question word and is therefore treated specially. A unique variable, a
single word consisting of an X followed by five integers, is created, and
the equation (EQUAL Xnnnnn P1*) is stored. For this example, the variable
X00001 was created, and this last simple sentence is transformed to the
equation:

(EQUAL X00001 (NUMBER OF CUSTOMERS TOM (GETS/VERB)))

In addition, the created variable is placed on the list of variables for
which STUDENT is to find a value. Also, this variable is stored, paired
with P1, the untransformed right side, for use in printing out the answer.
If a value is found for this variable, STUDENT prints the sentence (P1 is
value) with the appropriate substitution for value. Figure 7 shows the
full set of equations, and the printed solution given by STUDENT for the
example being considered. For ease in solution, the last equations created
are the first in this list of equations.

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00001 (NUMBER OF CUSTOMERS TOM (GETS / VERB)))

(EQUAL (NUMBER OF ADVERTISEMENTS (HE / PRO) RUNS) 45)

(EQUAL (NUMBER OF CUSTOMERS TOM (GETS / VERB)) (TIMES 2 (EXPT
(TIMES .2000 (NUMBER OF ADVERTISEMENTS (HE / PRO) RUNS)) 2)))

(THE NUMBER OF CUSTOMERS TOM GETS IS 162)

Figure 7
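
For this particular set of equations the answer can be reached by simple
forward substitution, as the following Python sketch shows. It is not the
SOLVE routine, which handles simultaneous equations in general; the names
evaluate and solve and the (variable, expression) pair representation are
assumptions made for the illustration.

```python
def evaluate(expr, known):
    """Evaluate a prefix expression; raises KeyError on an unknown variable."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return known[expr]
    op, *args = expr
    vals = [evaluate(a, known) for a in args]
    if op == "TIMES":
        result = 1
        for v in vals:
            result *= v
        return result
    if op == "EXPT":
        return vals[0] ** vals[1]
    raise ValueError(f"unhandled operator {op!r}")

def solve(equations):
    """Forward substitution over (variable, expression) pairs."""
    known, pending = {}, list(equations)
    progress = True
    while pending and progress:
        progress = False
        for eq in list(pending):
            var, expr = eq
            try:
                known[var] = evaluate(expr, known)
            except KeyError:
                continue          # depends on a value not yet known
            pending.remove(eq)
            progress = True
    return known

CUSTOMERS = "NUMBER OF CUSTOMERS TOM GETS"
ADS = "NUMBER OF ADVERTISEMENTS HE RUNS"
equations = [
    ("X00001", CUSTOMERS),
    (ADS, 45),
    (CUSTOMERS, ("TIMES", 2, ("EXPT", ("TIMES", 0.2, ADS), 2))),
]
values = solve(equations)
print("THE", CUSTOMERS, "IS", round(values["X00001"]))
```

Substituting 45 for the number of advertisements gives .2 times 45, or 9;
its square is 81, and twice that is the 162 printed in Figure 7.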

In the example just shown, the equality relation was indicated by the
copula "is". In the problem solved by STUDENT shown in Figure 8 below,
equality is indicated by the occurrence of a transitive verb in the proper
context.

(THE PROBLEM TO BE SOLVED IS)

(TOM HAS TWICE AS MANY FISH AS MARY HAS GUPPIES . IF MARY HAS
3 GUPPIES , WHAT IS THE NUMBER OF FISH TOM HAS Q.)

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00001 (NUMBER OF FISH TOM (HAS / VERB)))

(EQUAL (NUMBER OF GUPPIES (MARY / PERSON) (HAS / VERB)) 3)

(EQUAL (NUMBER OF FISH TOM (HAS / VERB)) (TIMES 2 (NUMBER OF
GUPPIES (MARY / PERSON) (HAS / VERB))))

(THE NUMBER OF FISH TOM HAS IS 6)

Figure 8

The verb in this case is "has". The simple sentence "Mary has 3 guppies"
is transformed to the "equivalent" sentence "The number of guppies Mary
has is 3" and the processing of this latter sentence is done as previously
discussed.

The general format for this type of sentence, and the format of the
intermediate sentence to which it is transformed, is best expressed by the
following COMIT transformation rule:

$ + $1/VERB + $1/NUMBER + $ = THE + NUMBER + OF + 4 + 1 + 2 + IS + 3

This may be read as: anything (a subject) followed by a verb followed by
a number followed by anything (the unit) is transformed to a sentence
starting with "THE NUMBER OF" followed by the unit, followed by the subject
and the verb, followed by "IS" and then the number. In "Mary has 3 guppies"
the subject is "Mary", the verb "has", and the units "guppies". Similarly,
the sentence "The witches of Firth brew 3 magic potions" would be
transformed to

"The number of magic potions the witches of Firth brew is 3."

In addition to a declaration of number, single object transitive
verbs may be used in a comparative structure, such as exhibited in the
sentence "Tom has twice as many fish as Mary has guppies." The COMIT
rule which gives the effective transformation for this type of sentence
structure is:

$ + $1/VERB + $ + AS + MANY + $ + AS + $ + $1/VERB + $ =
THE + NUMBER + OF + 6 + 1 + 2 + IS + 3 + THE + NUMBER + OF
+ 10 + 8 + 9

For the example, the transformed sentence is:

"The number of fish Tom has is twice the number of guppies Mary has."

Transformation of new sentence formats to formats previously
"understood" by the program can be easily added to the program, thus extending
the subset of English "understood" by STUDENT. In the processing that
actually takes place within STUDENT the intermediate sentence never exists.
It is easier to go directly to the model from the format, utilizing
subroutines previously defined in terms of the semantics of the model.
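
The two COMIT rules above can be imitated in Python by locating the tagged
constituents and reordering them. The sketch below does produce the
intermediate sentence, purely for illustration; VERBS is an assumed stand-in
for the VERB dictionary tag, a number is assumed to be a string of digits,
and the function names number_declaration and comparative are hypothetical.

```python
VERBS = {"HAS", "BREW", "GETS"}   # assumed stand-in for the VERB tag

def number_declaration(words):
    """$ + VERB + NUMBER + $  =  THE NUMBER OF 4 1 2 IS 3"""
    for i in range(1, len(words) - 2):
        if words[i] in VERBS and words[i + 1].isdigit():
            subject, verb = words[:i], words[i]
            number, unit = words[i + 1], words[i + 2:]
            return ["THE", "NUMBER", "OF"] + unit + subject + [verb, "IS", number]
    return None   # the format does not match

def comparative(words):
    """$ VERB $ AS MANY $ AS $ VERB $  =  THE NUMBER OF 6 1 2 IS 3
    THE NUMBER OF 10 8 9"""
    try:
        v1 = next(i for i, w in enumerate(words) if w in VERBS)
        as1 = words.index("AS", v1)
        if words[as1 + 1] != "MANY":
            return None
        as2 = words.index("AS", as1 + 2)
        v2 = next(i for i in range(as2 + 1, len(words)) if words[i] in VERBS)
    except (ValueError, StopIteration, IndexError):
        return None   # the format does not match
    p1, p3 = words[:v1], words[v1 + 1:as1]
    p6, p8, p10 = words[as1 + 2:as2], words[as2 + 1:v2], words[v2 + 1:]
    return (["THE", "NUMBER", "OF"] + p6 + p1 + [words[v1], "IS"] + p3
            + ["THE", "NUMBER", "OF"] + p10 + p8 + [words[v2]])

print(" ".join(number_declaration("MARY HAS 3 GUPPIES".split())))
print(" ".join(number_declaration(
    "THE WITCHES OF FIRTH BREW 3 MAGIC POTIONS".split())))
print(" ".join(comparative(
    "TOM HAS TWICE AS MANY FISH AS MARY HAS GUPPIES".split())))
```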

The word "is" indicates equality only if it is not used as an
auxiliary. The example in Figure 9 shows how verbal phrases containing "is", such
as "is multiplied by", and "is increased by" are handled in the
transformation.

(THE PROBLEM TO BE SOLVED IS)

(A NUMBER IS MULTIPLIED BY 6 . THIS PRODUCT IS INCREASED BY 44 .
THIS RESULT IS 68 . FIND THE NUMBER .)

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00001 (NUMBER))

(EQUAL (PLUS (TIMES (NUMBER) 6) 44) 68)

(THE NUMBER IS 4)

Figure 9

The sentence "A number is multiplied by 6" only indicates that two
objects in the model are related multiplicatively, and does not indicate
explicitly any equality relation. The interpretation of this sentence in
the model is the prefix notation product:

(TIMES (NUMBER) 6)

This latter phrase is stored in a temporary location for possible later
reference. In this problem, it is referenced in the next sentence, with
the phrase "THIS PRODUCT". The important word in this last phrase is
"THIS": STUDENT ignores all other words in a variable containing the key
word "THIS". The last temporarily stored phrase is substituted for the
"this" variable. Thus, the first three sentences in the problem stated in
Figure 9 yield only one equation, after two substitutions for "this"
phrases. The last sentence, "Find the number.", is transformed as if it
were "What is the number Q." and yields the first equation shown.
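The handling of "this" phrases can be illustrated with a minimal Python sketch. It is not the METEOR code used in STUDENT: the function names, the two verb patterns, and the tuple form of the prefix expressions are assumptions made only for illustration.

```python
import re

# Hypothetical patterns: verbal phrases containing "IS" that build a term
# rather than an equation.
TERM_PATTERNS = [
    (re.compile(r"^(.+) IS MULTIPLIED BY (\d+)$"), "TIMES"),
    (re.compile(r"^(.+) IS INCREASED BY (\d+)$"), "PLUS"),
]
# "IS" used as a copula indicates equality.
EQUALITY = re.compile(r"^(.+) IS (\d+)$")


def variable(phrase, last):
    """Return the stored phrase for a "THIS" variable, else a plain variable."""
    words = phrase.split()
    if "THIS" in words:
        return last  # all other words of the variable are ignored
    return tuple(w for w in words if w not in ("A", "AN", "THE"))


def translate(sentences):
    last = None       # the temporarily stored phrase
    equations = []
    for sentence in sentences:
        for pattern, op in TERM_PATTERNS:
            m = pattern.match(sentence)
            if m:
                last = (op, variable(m.group(1), last), int(m.group(2)))
                break
        else:
            m = EQUALITY.match(sentence)
            if m:
                equations.append(
                    ("EQUAL", variable(m.group(1), last), int(m.group(2))))
    return equations


figure_9 = ["A NUMBER IS MULTIPLIED BY 6",
            "THIS PRODUCT IS INCREASED BY 44",
            "THIS RESULT IS 68"]
print(translate(figure_9))
```

Under these assumptions the three sentences yield the single equation (EQUAL (PLUS (TIMES (NUMBER) 6) 44) 68), after two substitutions for "this" phrases.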

The word "this" may occur in a context where it is not referring to
a previously stored phrase. Figure 10 shows an example of such a context.

(THE PROBLEM TO BE SOLVED IS)

(THE PRICE OF A RADIO IS 69.70 DOLLARS . IF THIS PRICE IS
15 PERCENT LESS THAN THE MARKED PRICE , FIND THE MARKED PRICE
.)

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00001 (MARKED PRICE))

(EQUAL (PRICE OF RADIO) (TIMES .8499 (MARKED PRICE)))

(EQUAL (PRICE OF RADIO) (TIMES 69.70 (DOLLARS)))

(THE MARKED PRICE IS 82 DOLLARS)

Figure 10

In such contexts, the phrase containing "this" is replaced by the left half
of the last equation created. In the example, the phrase "this price" is
replaced by "the price of a radio".

The problem in Figure 10 illustrates two other features of the STUDENT
program. The first is the action of the complex operator "percent less than".
It causes the number immediately preceding it, i.e., 15, to be subtracted
from 100, and this result to be divided by 100, giving .85 (.8499 due to
rounding errors in conversion). Then this operator becomes the infix
operator "TIMES". This is as indicated in the table in Figure 6.
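The arithmetic performed by "percent less than" can be checked with a short Python sketch. The function name is hypothetical, and exact fractions are used here as an assumption so that the factor comes out as .85 instead of the .8499 that STUDENT prints.

```python
from fractions import Fraction


def percent_less_than(n):
    """Factor implied by "n PERCENT LESS THAN": (100 - n) / 100."""
    return Fraction(100 - n, 100)


factor = percent_less_than(15)            # 17/20, i.e. .85
price_of_radio = Fraction("69.70")        # value given in the first sentence
# Solve (PRICE OF RADIO) = (TIMES factor (MARKED PRICE)) for the marked price.
marked_price = price_of_radio / factor
print(float(factor), marked_price)        # 0.85 82
```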

This problem also illustrates how units such as "dollars" are handled
by the STUDENT program. Any word which immediately follows a number is
labelled as a special type of variable called a unit. A number followed
by a unit is treated in the equation as a product of the number and the
unit, e.g., "(TIMES 69.70 (DOLLARS))". Units are treated specially in solving
the set of equations, in that any unit may appear in the answer. If the
value for a variable found by the solver is the product of a number and
a unit, STUDENT concatenates the number and unit. For example, the
solution for "(MARKED PRICE)" in the problem in Figure 10 was
(TIMES 82 (DOLLARS)) and STUDENT prints out

(THE MARKED PRICE IS 82 DOLLARS)

There is an exception to the fact that any unit may appear in the
answer, as illustrated in Figure 11.

(THE PROBLEM TO BE SOLVED IS)

(IF 1 SPAN EQUALS 9 INCHES , AND 1 FATHOM EQUALS 6 FEET , HOW
MANY SPANS EQUALS 1 FATHOM Q.)

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00001 (TIMES 1 (FATHOMS)))

(EQUAL (TIMES 1 (FATHOMS)) (TIMES 6 (FEET)))

(EQUAL (TIMES 1 (SPANS)) (TIMES 9 (INCHES)))

UNABLE TO SOLVE THIS SET OF EQUATIONS

(USING THE FOLLOWING KNOWN RELATIONSHIPS)

((EQUAL (TIMES 1 (YARDS)) (TIMES 3 (FEET))) (EQUAL (TIMES 1
(FEET)) (TIMES 12 (INCHES))))

(1 FATHOM IS 8 SPANS)

Figure 11

If, as in the problem in Figure 11, the unit of the answer is specified
(here by the phrase "how many spans"), then only that unit, in this case
spans, may appear in the answer. Without this restriction, STUDENT would
blithely answer this problem with "(1 FATHOM IS 1 FATHOM)".

In the transformation from the English statement of the problem to
the equations, 9 inches became (TIMES 9 (INCHES)). However, 1 fathom
became (TIMES 1 (FATHOMS)). The plural form for fathom has been
substituted for the singular form. STUDENT always uses the plural form if
known, to ensure that all units appear in only one form. Since "fathom" and
"fathoms" are different, STUDENT would treat them as distinct, unrelated
units. The plural form is part of the global information that can be made
available to STUDENT, and the plural form of a word is substituted for any
singular form appearing after "1" in any phrase. The inverse operation is
carried out to perform correct printout of the solution.
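A small Python sketch of this normalization follows. The table of plurals stands in for the global information mentioned above; the dictionary and function names are hypothetical and are not taken from the STUDENT program.

```python
# Hypothetical store of known plurals (global information).
PLURAL = {"FATHOM": "FATHOMS", "SPAN": "SPANS", "FOOT": "FEET", "INCH": "INCHES"}
SINGULAR = {plural: singular for singular, plural in PLURAL.items()}


def to_term(number, unit):
    """Build (TIMES number (UNIT)), always using the plural form if known."""
    return ("TIMES", number, (PLURAL.get(unit, unit),))


def print_quantity(term):
    """Inverse operation: restore the singular form after the number 1."""
    _, number, (unit,) = term
    if number == 1:
        unit = SINGULAR.get(unit, unit)
    return f"{number} {unit}"


print(to_term(1, "FATHOM"))                          # ('TIMES', 1, ('FATHOMS',))
print(print_quantity(("TIMES", 1, ("FATHOMS",))))    # 1 FATHOM
print(print_quantity(("TIMES", 8, ("SPANS",))))      # 8 SPANS
```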

Notice that the information given in the problem was insufficient to
allow solution of the set of equations to be solved. Therefore, STUDENT
looked in its glossary for information concerning each of the units in
this set of equations. It found the relationship "1 foot equals 12 inches".
Using this fact, and the equation it implies, STUDENT is able to solve the
problem. Thus, in certain cases where a problem is not "analytic", in the
sense that it does not contain, explicitly stated, all the information
needed for its solution, STUDENT is able to draw on a body of facts, pick
out relevant ones, and use these to obtain a solution.
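The effect of adding glossary facts can be reproduced with the following Python sketch. It swaps in a simple walk over unit relationships in place of STUDENT's equation solver, so it is only an illustration of why the extra fact makes the problem solvable; the function name and the tuple encoding of the relationships are assumptions.

```python
from fractions import Fraction


def convert(amount, src, dst, relations):
    """Convert amount of src into dst using facts (a, unit_a, b, unit_b),
    each meaning "a unit_a equals b unit_b". Returns None if no chain of
    facts connects the two units."""
    graph = {}
    for a, unit_a, b, unit_b in relations:
        graph.setdefault(unit_a, []).append((unit_b, Fraction(b, a)))
        graph.setdefault(unit_b, []).append((unit_a, Fraction(a, b)))
    stack, seen = [(src, Fraction(amount))], {src}
    while stack:
        unit, value = stack.pop()
        if unit == dst:
            return value
        for nxt, factor in graph.get(unit, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, value * factor))
    return None


problem = [(1, "SPANS", 9, "INCHES"), (1, "FATHOMS", 6, "FEET")]
glossary = [(1, "YARDS", 3, "FEET"), (1, "FEET", 12, "INCHES")]

print(convert(1, "FATHOMS", "SPANS", problem))              # None
print(convert(1, "FATHOMS", "SPANS", problem + glossary))   # 8
```

With the problem statement alone no chain links fathoms to spans; once "1 foot equals 12 inches" is available, the answer of 8 spans follows.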

There is another class of problems which I call semi-analytic. In
such problems, the transformation process does not yield a set of solvable
equations. However, in this set of equations there exists a pair of
variables (or more than one pair) such that the two variables are only
"slightly" different, and really name the same object in the model. When a
set of equations is unsolvable, STUDENT searches for relevant global
equations. In addition, it uses several heuristic techniques for
identifying two slightly different variables in the equations. The problem
in Figure 12 illustrates identification of two variables where in one
variable a pronoun has been substituted for a noun phrase in the other
variable. This identification is made by checking all variables appearing
before one containing the pronoun, and finding one which is identical to
this pronoun phrase, with a substitution of a string of any length for the
pronoun.

(THE PROBLEM TO BE SOLVED IS)

(THE NUMBER OF SOLDIERS THE RUSSIANS HAVE IS ONE HALF OF THE
NUMBER OF GUNS THEY HAVE . THE NUMBER OF GUNS THEY HAVE IS
7000 . WHAT IS THE NUMBER OF SOLDIERS THEY HAVE Q.)

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00001 (NUMBER OF SOLDIERS (THEY / PRO) (HAVE / VERB)))

(EQUAL (NUMBER OF GUNS (THEY / PRO) (HAVE / VERB)) 7000)

(EQUAL (NUMBER OF SOLDIERS RUSSIANS (HAVE / VERB)) (TIMES .5000
(NUMBER OF GUNS (THEY / PRO) (HAVE / VERB))))

UNABLE TO SOLVE THIS SET OF EQUATIONS

(ASSUMING THAT)

((NUMBER OF SOLDIERS (THEY / PRO) (HAVE / VERB)) IS EQUAL TO
(NUMBER OF SOLDIERS RUSSIANS (HAVE / VERB)))

(THE NUMBER OF SOLDIERS THEY HAVE IS 3500)

Figure 12

If two variables match in this fashion, STUDENT assumes the two variables
are equal, and prints out a statement of this assumption, as shown. The
solution procedure is then tried again, with the additional equations from
identifications. In the example, the additional equation was sufficient to
determine the solution.
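The pronoun heuristic can be sketched in Python as a pattern match in which the pronoun stands for a string of any length. The pronoun list, the function names, and the choice to skip earlier variables that themselves contain a pronoun are assumptions made for this illustration; the dictionary tags (/ PRO, / VERB) are omitted for brevity.

```python
import re

PRONOUNS = {"THEY", "HE", "SHE", "IT"}   # hypothetical list


def matches_with_pronoun(pronoun_var, candidate):
    """True if candidate equals pronoun_var with some string of any length
    substituted for each pronoun."""
    parts = [r".+" if w in PRONOUNS else re.escape(w)
             for w in pronoun_var.split()]
    return re.fullmatch(" ".join(parts), candidate) is not None


def identify_pronoun_variables(variables):
    """variables are listed in order of appearance in the problem."""
    pairs = []
    for i, var in enumerate(variables):
        if not PRONOUNS & set(var.split()):
            continue
        for earlier in variables[:i]:
            if PRONOUNS & set(earlier.split()):
                continue   # assumption: match only against pronoun-free phrases
            if matches_with_pronoun(var, earlier):
                pairs.append((var, earlier))
                break
    return pairs


figure_12 = ["NUMBER OF SOLDIERS RUSSIANS HAVE",
             "NUMBER OF GUNS THEY HAVE",
             "NUMBER OF SOLDIERS THEY HAVE"]
print(identify_pronoun_variables(figure_12))
```

For the variables of Figure 12 this yields the single identification printed by STUDENT after (ASSUMING THAT).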

The example in Figure 13 is again a "non-analytic" problem. The
first set of equations developed by STUDENT is unsolvable. Therefore,
STUDENT tries to find some relevant equations in its store of global
information. It uses the first word of each variable string as a key to
its glossary. The one exception to this rule is that the words "number of"
are ignored if they are the first two words of a variable string. Thus,
in this problem, STUDENT retrieved equations which were stored under the
key words distance, gallons, gas, and miles. Two facts about distance
had been stored earlier: "distance equals speed times time" and "distance
equals gas consumption times number of gallons of gas used". The
equations implicit in these sentences were stored, and are retrieved now as
possibly relevant to the solution. In fact, only the second is relevant.
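The key-word rule is simple enough to state as a few lines of Python. The function name is hypothetical; the variable strings are those of Figure 13.

```python
def glossary_key(variable):
    """First word of the variable string, ignoring a leading "NUMBER OF"."""
    words = variable.split()
    if words[:2] == ["NUMBER", "OF"]:
        words = words[2:]
    return words[0] if words else None


figure_13_variables = [
    "NUMBER OF GALLONS OF GAS USED ON TRIP BETWEEN NEW YORK AND BOSTON",
    "DISTANCE BETWEEN BOSTON AND NEW YORK",
    "GAS CONSUMPTION OF MY CAR",
    "MILES",
    "GALLONS",
]
keys = sorted({glossary_key(v) for v in figure_13_variables})
print(keys)   # ['DISTANCE', 'GALLONS', 'GAS', 'MILES']
```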


(THE PROBLEM TO BE SOLVED IS) 

(THE GAS CONSUMPTION OF MY CAR IS 15 MILES PER GALLON . THE
DISTANCE BETWEEN BOSTON AND NEW YORK IS 250 MILES . WHAT IS
THE NUMBER OF GALLONS OF GAS USED ON A TRIP BETWEEN NEW YORK
AND BOSTON Q.)


(THE EQUATIONS TO BE SOLVED ARE) 

(EQUAL X00001 (NUMBER OF GALLONS OF GAS USED ON TRIP BETWEEN 
NEW YORK AND BOSTON)) 

(EQUAL (DISTANCE BETWEEN BOSTON AND NEW YORK) (TIMES 250 (MILES))) 

(EQUAL (GAS CONSUMPTION OF MY CAR) (QUOTIENT (TIMES 15 (MILES)) 
(TIMES 1 (GALLONS)))) 


UNABLE TO SOLVE THIS SET OF EQUATIONS 
(USING THE FOLLOWING KNOWN RELATIONSHIPS) 

((EQUAL (DISTANCE) (TIMES (SPEED) (TIME))) (EQUAL (DISTANCE) 
(TIMES (GAS CONSUMPTION) (NUMBER OF GALLONS OF GAS USED)))) 

(ASSUMING THAT) 

((DISTANCE) IS EQUAL TO (DISTANCE BETWEEN BOSTON AND NEW YORK))
(ASSUMING THAT) 

((GAS CONSUMPTION) IS EQUAL TO (GAS CONSUMPTION OF MY CAR)) 
(ASSUMING THAT) 

((NUMBER OF GALLONS OF GAS USED) IS EQUAL TO (NUMBER OF GALLONS 
OF GAS USED ON TRIP BETWEEN NEW YORK AND BOSTON)) 


(THE NUMBER OF GALLONS OF GAS USED ON A TRIP BETWEEN NEW YORK
AND BOSTON IS 16.66 GALLONS)


Figure 13

Before any attempt is made to solve this augmented set of equations,
the variables of the augmented set are matched, to identify "slightly
different" variables which refer to the same object in the model. In
this example "(DISTANCE)", "(GAS CONSUMPTION)" and "(NUMBER OF GALLONS OF
GAS USED)" are all identified with "similar" variables. The following
heuristically determined conditions must be satisfied for identification
of variables P1 and P2.

1) P1 must appear later in the problem than P2.

2) P1 is completely contained in P2 in the sense that P1 is a
contiguous substring within P2.

This identification reflects a syntactic phenomenon where a truncated
phrase, with one or more modifying phrases dropped, is often used in place
of the entire phrase. For example, if the phrase "the length of a
rectangle" has occurred, the phrase "the length" may be used to mean the
same thing. This identification is distinct from that made using pronoun
substitution.
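The two conditions can be written down directly in Python. In this sketch the function names are hypothetical, the match is made word by word, and unit variables such as (MILES) and (GALLONS) are left out of the input, which is an assumption about how units are kept apart from ordinary variables.

```python
def contains(p2_words, p1_words):
    """Condition 2: p1 is a contiguous run of words within p2."""
    n = len(p1_words)
    return any(p2_words[i:i + n] == p1_words
               for i in range(len(p2_words) - n + 1))


def identify_truncated(variables):
    """variables are in order of appearance; condition 1 is enforced by
    comparing each P1 only with the variables that precede it."""
    pairs = []
    for i, p1 in enumerate(variables):
        for p2 in variables[:i]:
            if p1 != p2 and contains(p2.split(), p1.split()):
                pairs.append((p1, p2))
                break
    return pairs


# Problem variables of Figure 13, followed by the retrieved schema variables.
figure_13 = [
    "NUMBER OF GALLONS OF GAS USED ON TRIP BETWEEN NEW YORK AND BOSTON",
    "DISTANCE BETWEEN BOSTON AND NEW YORK",
    "GAS CONSUMPTION OF MY CAR",
    "DISTANCE", "SPEED", "TIME",
    "GAS CONSUMPTION",
    "NUMBER OF GALLONS OF GAS USED",
]
for p1, p2 in identify_truncated(figure_13):
    print(f"(({p1}) IS EQUAL TO ({p2}))")
```

The three pairs found correspond to the three (ASSUMING THAT) statements in Figure 13; (SPEED) and (TIME) match nothing, which is consistent with the first schema being irrelevant.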

In the example in Figure 13, a schema is used by identifying the
variables in the schema with the variables that occur in the problem. This
problem is solvable exactly because the key phrases "distance", "gas
consumption" and "number of gallons of gas used" occur as substrings of
the variables in the problem. Since STUDENT identifies each generic key
phrase of the schema with a particular variable of the problem, any schema
can be used only once in a problem. Because STUDENT handles schemas in this
ad hoc fashion, it cannot solve problems in which a relationship such as
"distance equals speed times time" is needed for two different values of
distance, speed, and time. With some effort, this weakness in the program
could be overcome.

(THE PROBLEM TO BE SOLVED IS)

(THE LENGTH OF A RECTANGLE IS 8 INCHES MORE THAN THE WIDTH
OF THE RECTANGLE . ONE HALF OF THE PERIMETER OF THE RECTANGLE
IS 18 INCHES . FIND THE LENGTH AND THE WIDTH OF THE RECTANGLE
.)

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00001 (WIDTH OF RECTANGLE))

(EQUAL X00002 (LENGTH))

(EQUAL (TIMES .5000 (PERIMETER OF RECTANGLE)) (TIMES 18 (INCHES)))

(EQUAL (LENGTH OF RECTANGLE) (PLUS (TIMES 8 (INCHES)) (WIDTH
OF RECTANGLE)))

UNABLE TO SOLVE THIS SET OF EQUATIONS
TESTING POSSIBLE IDIOMS

(THE PROBLEM WITH AN IDIOMATIC SUBSTITUTION IS)

(THE LENGTH OF A RECTANGLE IS 8 INCHES MORE THAN THE WIDTH
OF THE RECTANGLE . ONE HALF OF TWICE THE SUM OF THE LENGTH
AND WIDTH OF THE RECTANGLE IS 18 INCHES . FIND THE LENGTH AND
THE WIDTH OF THE RECTANGLE .)

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00003 (WIDTH OF RECTANGLE))

(EQUAL X00004 (LENGTH))

(EQUAL (TIMES (TIMES .5000 2) (PLUS (LENGTH) (WIDTH OF RECTANGLE)))
(TIMES 18 (INCHES)))

(EQUAL (LENGTH OF RECTANGLE) (PLUS (TIMES 8 (INCHES)) (WIDTH
OF RECTANGLE)))

UNABLE TO SOLVE THIS SET OF EQUATIONS

(USING THE FOLLOWING KNOWN RELATIONSHIPS)

((EQUAL (TIMES 1 (FEET)) (TIMES 12 (INCHES))))

(ASSUMING THAT)

((LENGTH) IS EQUAL TO (LENGTH OF RECTANGLE))

(THE LENGTH IS 13 INCHES)

(THE WIDTH OF THE RECTANGLE IS 5 INCHES)


Figure 15

Special Heuristics

The methods thus far discussed have been applicable to the entire range
of algebra problems. However, for special classes of problems, additional
heuristics may be used which are needed for members of the class, but not
applicable to other problems. An example is the class of age problems, as
typified by the problem in Figure 16.

Before the age problem heuristics are used, a problem must be identified
as belonging to that class. STUDENT identifies age problems by any
occurrence of one of the following phrases: "as old as", "years old" and
"age". This identification is made immediately after all words are looked
up in the dictionary and tagged by function. After the special heuristics
are used, the modified problem is transformed to equations exactly as stated
previously.

(THE PROBLEM TO BE SOLVED IS)

(BILL S FATHER S UNCLE IS TWICE AS OLD AS BILL S FATHER . 2
YEARS FROM NOW BILL S FATHER WILL BE 3 TIMES AS OLD AS BILL
. THE SUM OF THEIR AGES IS 92 . FIND BILL S AGE .)

(THE EQUATIONS TO BE SOLVED ARE)

(EQUAL X00001 ((BILL / PERSON) S AGE))

(EQUAL (PLUS ((BILL / PERSON) S (FATHER / PERSON) S (UNCLE
/ PERSON) S AGE) (PLUS ((BILL / PERSON) S (FATHER / PERSON)
S AGE) ((BILL / PERSON) S AGE))) 92)

(EQUAL (PLUS ((BILL / PERSON) S (FATHER / PERSON) S AGE) 2)
(TIMES 3 (PLUS ((BILL / PERSON) S AGE) 2)))

(BILL S AGE IS 8)

Figure 16

The need for special methods for age problems arises because of the
conventions used for denoting the variables, all of which are ages. The
word age is usually not used explicitly, but is implicit in such phrases
as "as old as". People's names are used where their ages are really the
implicit variables. In the example, for instance, the phrase "Bill's
father's uncle" is used to refer to "Bill's father's uncle's age".

STUDENT uses a special heuristic to make all these ages explicit. To
do this, it must know which words are "person words" and therefore may be
associated with an age. For this problem STUDENT has been told that Bill,
father, and uncle are person words. They can be seen tagged as such in the
equations. The "S" following a word is the STUDENT representation for the
possessive, used instead of "apostrophe - s" for programming convenience.
STUDENT inserts an "S AGE" after every person word not followed by an "S"
(because this "S" indicates that the person word is being used in a
possessive sense, not as an independent age variable). Thus, as indicated,
the phrase "BILL S FATHER S UNCLE" becomes "BILL S FATHER S UNCLE S AGE".
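The insertion rule can be sketched in a few lines of Python. The function name and the plain set of person words are assumptions for illustration; in STUDENT the person words are recognized by their dictionary tags.

```python
def make_ages_explicit(words, person_words):
    """Insert "S AGE" after every person word not followed by "S"."""
    out = []
    for i, word in enumerate(words):
        out.append(word)
        following = words[i + 1] if i + 1 < len(words) else None
        if word in person_words and following != "S":
            out += ["S", "AGE"]
    return out


PERSONS = {"BILL", "FATHER", "UNCLE"}
sentence = "BILL S FATHER S UNCLE IS TWICE AS OLD AS BILL S FATHER".split()
print(" ".join(make_ages_explicit(sentence, PERSONS)))
# BILL S FATHER S UNCLE S AGE IS TWICE AS OLD AS BILL S FATHER S AGE
```

A phrase that already names an age, such as "BILL S AGE", is left unchanged, because BILL is followed by "S".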

In addition to changing phrases naming people to ones naming ages,
STUDENT makes certain special idiomatic substitutions. For the phrase
"their ages", STUDENT substitutes all the age variables encountered in
the problem. In the example, for "THEIR AGES" STUDENT substitutes "BILL S
FATHER S UNCLE S AGE AND BILL S FATHER S AGE AND BILL S AGE". The phrases
"as old as" and "years old" are then deleted as dummy phrases not having
any meaning, and "will be" and "was" are changed to "IS". There is no
need to preserve the tense of the copula, since the sense of the future
tense is preserved in such prefix phrases as "2 years from now".

The remaining special age problem heuristics are used to process the
phrases "in 2 years", "5 years ago" and "now". The phrase "2 years from
now" is transformed to "in 2 years" before processing. These three time
phrases may occur immediately after the word age (e.g., "Bill's age 3 years
ago") or at the beginning of the sentence. If a time phrase occurs at the
beginning of the sentence, it implicitly modifies all ages mentioned [...]

[...] sentence becomes the two sentences: "Mary is twice as old as Ann X00007
years ago. X00007 years ago Mary was as old as Ann is now." These two
occurrences of time phrases are handled as discussed previously. Similarly
the phrase "will be when" would be transformed to "in K years . In K years".

These decoupling heuristics are useful not only for the STUDENT program
but for people trying to solve age problems. The classic age problem in
Figure 17 took an MIT graduate student over 5 minutes to solve because he
did not know this heuristic. With the heuristic he was able to set up the
appropriate equations much more rapidly. As a crude measure of STUDENT's
relative skill, note that STUDENT took less than one minute to solve the
problem in Figure 17.

Global Information 

This algebra problem-solving system contains two programs which process
English input. One is the program thus far discussed, STUDENT, which
accepts the statement of an algebra story problem and attempts to find the
solution to the particular problem. STUDENT does not store any information,
nor "remember" anything from problem to problem.

The other program is called REMEMBER and it processes and stores facts
not specific to any one problem. These facts make up STUDENT's store of
"global information". This information is accepted in a subset of English
which overlaps but is different from the subset of English accepted by STUDENT.
REMEMBER accepts statements in certain fixed formats. The following are
the formats currently understood, and the method of storage of the
information obtained in each format.

1) Example: Distance equals speed times time.

Format: P1 equals P2.

Processing: The sentence is transformed into an equation in the
same way it is done in STUDENT. This equation is stored on the property
lists of the atoms which are the first words in each variable. In the
example, the equation

"(EQUAL (DISTANCE) (TIMES (SPEED) (TIME)))"

is stored on the property lists of "DISTANCE", "SPEED" and "TIME". This
equation will be retrieved if it is needed and one of these words appears
in the problem.

2) Example: Times is an operator of level 1.

Format: P1 is an operator of level K.

Processing: A dictionary entry for P1 is created with subscripts
of OP and K. For TIMES, the dictionary entry (TIMES/OP 1) is created.
These entries are those that are used to determine the tagging of words by
function.

3) Example: OF is an operator.

Format: P1 is an operator.

Processing: Creates a dictionary entry for P1 with the subscript
OP. The entry for OF is (OF/OP).

4) Example: Bill is a person.

Format: P1 is a P2.

Processing: Creates a dictionary entry for P1, with P2 as a
subscript. The entry for BILL is (BILL/PERSON).

5) Example: Feet is the plural of foot.

Format: P1 is the plural of P2.

Processing: On the property list of P1, after the flag SING, P2
is stored; on the property list of P2, after the flag PLURAL, the word P1
is stored. Thus FEET is stored after PLURAL on the property list of the
atom FOOT.

6) Example: One half always means 0.5.

Format: P1 always means P2.

Processing: The program STUDENT is actually modified so that if
P1 occurs, the mandatory substitution of P2 for P1 will be made. The last
sentence of this format processed by REMEMBER will be the first mandatory
substitution made. Thus "one always means 1" followed by "one half always
means 0.5" will cause the desired substitutions to be made; if these
sentences were reversed, no occurrence of "one half" would ever be found,
since it would have been changed to "1 half".

7) Example: Two numbers sometimes means one number and the other
number.

Format: P1 sometimes means P2.

Processing: The STUDENT program is modified so that the possible
idiomatic substitution of P2 for P1 will be made in a problem if it is
otherwise unsolvable. All such "possible idiomatic substitutions" are
tried when necessary, with the last one entered being the first one tried.

These last two formats actually insert new METEOR program statements
into STUDENT. If P1 and P2 are METEOR left and right halves respectively,
using special METEOR features, these formats can be used to extend the
subset of English understood by STUDENT.
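The ordering rule for the "always means" format (the last sentence processed is the first substitution made) can be demonstrated with a small Python sketch. It uses an ordinary list of string replacements in place of inserted METEOR statements, and the function names are hypothetical.

```python
import re


def remember_always(substitutions, phrase, replacement):
    """Record "phrase always means replacement"; newest entries go first."""
    substitutions.insert(0, (phrase, replacement))


def apply_mandatory(text, substitutions):
    """Apply every mandatory substitution in stored order."""
    for phrase, replacement in substitutions:
        text = re.sub(rf"\b{re.escape(phrase)}\b", replacement, text)
    return text


good = []
remember_always(good, "ONE", "1")
remember_always(good, "ONE HALF", "0.5")    # entered last, so tried first

bad = []
remember_always(bad, "ONE HALF", "0.5")
remember_always(bad, "ONE", "1")            # now "ONE" is replaced first

print(apply_mandatory("ONE HALF OF ONE NUMBER", good))   # 0.5 OF 1 NUMBER
print(apply_mandatory("ONE HALF OF ONE NUMBER", bad))    # 1 HALF OF 1 NUMBER
```

In the reversed order the phrase "ONE HALF" is never found, because it has already been changed to "1 HALF".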

Conclusion 

The STUDENT program accepts as input an algebra story problem couched
in a restricted subset of English. It tries to solve this problem by
mapping the input into a relational model and manipulating structures in the
model. It uses general information also entered in English, and states in
English any assumptions and facts used to find the solution to the problem
given. If the solution is found, it is also communicated in English.

STUDENT could be extended in many ways within the framework developed.
For example, much more sophisticated sentence parsing routines could be
used. This would enable STUDENT to extract "simple sentences" from much
more complicated grammatical structures. In addition, it might help in
the identification of a phrase which is a paraphrase of another. This
identification is the most difficult task STUDENT attempts, and the methods
used are heuristic and ad hoc. Better methods making more use of the
meaning of the words in the phrases should be found. Because of the current
method of identifying variables, a stored schema or equation can be used
only once. This should be improved.

I think we are far from achieving a program which understands all of
English. However, within its limited area of competence, STUDENT has
demonstrated that it has a good "understanding" of English, and I think
that limited programs such as this may actually prove to be useful tools
in their own right.





Scanning of this document was supported in part by the Corporation for
National Research Initiatives, using funds from the Advanced Research
Projects Agency of the United States Government under
Grant: MDA972-92-J1029.

The scanning agent for this project was the Document Services department
of the M.I.T. Libraries. Technical support for this project was also
provided by the M.I.T. Laboratory for Computer Sciences.




